LLM, Agent, and Framework Standards Guide
Table of Contents
- Overview
- Why Standards Matter
- Tool Calling Standards
- Agent Communication Protocols
- Prompt Format Standards
- API Standards
- Evaluation Standards
- Interoperability Standards
- Emerging Standards
- Best Practices
Overview
As the AI ecosystem matures, standardization becomes critical for interoperability, portability, and ecosystem growth. This guide covers the major standards, protocols, and conventions used across LLMs, agents, and frameworks.
Key Standards Bodies: - OpenAI (de facto standards through API design) - Anthropic (Claude API standards) - Model Context Protocol (MCP) project (open protocol initiated by Anthropic) - OpenAPI Initiative - W3C (potential future involvement) - Linux Foundation AI & Data
Why Standards Matter
The Problem Without Standards
┌─────────────────────────────────────────────────────────────┐
│ WITHOUT STANDARDS: Fragmentation │
├─────────────────────────────────────────────────────────────┤
│ │
│ App A → Custom Format → Model X │
│ App B → Different Format → Model Y │
│ App C → Another Format → Model Z │
│ │
│ Result: │
│ ❌ N × M integrations (every app × every model) │
│ ❌ No portability │
│ ❌ Vendor lock-in │
│ ❌ Duplicate effort │
│ ❌ Slow innovation │
└─────────────────────────────────────────────────────────────┘
The Solution With Standards
┌─────────────────────────────────────────────────────────────┐
│ WITH STANDARDS: Interoperability │
├─────────────────────────────────────────────────────────────┤
│ │
│ App A ─┐ │
│ App B ─┼─→ Standard Protocol → Any Model │
│ App C ─┘ │
│ │
│ Result: │
│ ✅ Write once, use everywhere │
│ ✅ Easy model switching │
│ ✅ No vendor lock-in │
│ ✅ Ecosystem growth │
│ ✅ Faster innovation │
└─────────────────────────────────────────────────────────────┘
Tool Calling Standards
1. OpenAI Function Calling Standard
Status: De facto industry standard
Adoption: OpenAI, Azure OpenAI, many open-source models
Specification: https://platform.openai.com/docs/guides/function-calling
Format:
{
  "model": "gpt-4",
  "messages": [
    {"role": "user", "content": "What's the weather in Paris?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "City name, e.g. Paris"
            },
            "unit": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"]
            }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
Response Format:
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"Paris\", \"unit\": \"celsius\"}"
            }
          }
        ]
      }
    }
  ]
}
Key Features: - JSON Schema for parameter validation - Multiple tool calls in single response - Tool choice control (auto, required, none) - Unique call IDs for tracking
Adoption: - ✅ OpenAI GPT-3.5, GPT-4 - ✅ Azure OpenAI - ✅ Many open-source models (via adapters) - ✅ LangChain, LlamaIndex support
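As a minimal sketch of how a client consumes this response shape (pure Python, no SDK; `parse_tool_calls` is an illustrative helper, not part of the OpenAI library):

```python
import json

def parse_tool_calls(response: dict) -> list[tuple[str, dict]]:
    """Extract (tool_name, parsed_arguments) pairs from an
    OpenAI-style chat completion response. Note that `arguments`
    arrives as a JSON *string* and must be decoded."""
    calls = []
    message = response["choices"][0]["message"]
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls

# Mirrors the response example above
response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_abc123",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": "{\"location\": \"Paris\", \"unit\": \"celsius\"}",
                },
            }],
        }
    }]
}
print(parse_tool_calls(response))  # [('get_weather', {'location': 'Paris', 'unit': 'celsius'})]
```

Decoding `arguments` with `json.loads` is the step most often missed when hand-rolling a client.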
2. Anthropic Tool Use Standard
Status: Claude-specific, growing adoption
Adoption: Anthropic Claude, AWS Bedrock (Claude)
Specification: https://docs.anthropic.com/claude/docs/tool-use
Format:
{
  "model": "claude-3-opus-20240229",
  "max_tokens": 1024,
  "tools": [
    {
      "name": "get_weather",
      "description": "Get current weather for a location",
      "input_schema": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "City name"
          },
          "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"]
          }
        },
        "required": ["location"]
      }
    }
  ],
  "messages": [
    {"role": "user", "content": "What's the weather in Paris?"}
  ]
}
Response Format:
{
  "content": [
    {
      "type": "tool_use",
      "id": "toolu_01A09q90qw90lq917835lq9",
      "name": "get_weather",
      "input": {
        "location": "Paris",
        "unit": "celsius"
      }
    }
  ],
  "stop_reason": "tool_use"
}
Key Differences from OpenAI:
- Uses input_schema instead of parameters
- Tool calls in content array (not separate field)
- Different ID format
- More flexible content blocks
Adoption: - ✅ Anthropic Claude (all versions) - ✅ AWS Bedrock (Claude models) - ✅ LangChain, LangGraph support - ⚠️ Requires adaptation for other models
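The content-array layout means a client filters blocks by type rather than reading a dedicated field. A minimal sketch (`extract_tool_uses` is an illustrative helper, not part of the Anthropic SDK):

```python
def extract_tool_uses(response: dict) -> list[dict]:
    """Pull tool_use blocks out of an Anthropic-style response.
    Unlike OpenAI, `input` is already a parsed object, and tool
    calls live alongside text blocks in the `content` array."""
    return [
        {"id": b["id"], "name": b["name"], "input": b["input"]}
        for b in response.get("content", [])
        if b.get("type") == "tool_use"
    ]

# Mirrors the response example above
response = {
    "content": [
        {"type": "tool_use", "id": "toolu_01A09q90qw90lq917835lq9",
         "name": "get_weather",
         "input": {"location": "Paris", "unit": "celsius"}}
    ],
    "stop_reason": "tool_use",
}
calls = extract_tool_uses(response)
print(calls[0]["name"])  # get_weather
```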
3. Model Context Protocol (MCP) Tool Standard
Status: Emerging standard (Nov 2024)
Adoption: Anthropic, growing ecosystem
Specification: https://modelcontextprotocol.io/
Format:
{
  "jsonrpc": "2.0",
  "method": "tools/list",
  "id": 1
}
Response:
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "tools": [
      {
        "name": "get_weather",
        "description": "Get current weather",
        "inputSchema": {
          "type": "object",
          "properties": {
            "location": {"type": "string"}
          },
          "required": ["location"]
        }
      }
    ]
  }
}
Tool Invocation:
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "get_weather",
    "arguments": {
      "location": "Paris"
    }
  },
  "id": 2
}
Key Features: - JSON-RPC 2.0 protocol - Standardized discovery mechanism - Server-client architecture - Transport agnostic (STDIO, HTTP/SSE)
Adoption: - ✅ Anthropic Claude Desktop - ✅ Growing MCP server ecosystem - 🔄 Early adoption phase - 🔄 Framework integration in progress
Comparison Matrix
| Feature | OpenAI | Anthropic | MCP |
|---|---|---|---|
| Format | JSON | JSON | JSON-RPC 2.0 |
| Schema | JSON Schema | JSON Schema | JSON Schema |
| Discovery | Static | Static | Dynamic |
| Transport | HTTP | HTTP | STDIO/HTTP/SSE |
| Multi-call | Yes | Yes | Yes |
| Streaming | Yes | Yes | Yes |
| Adoption | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ (new) |
Agent Communication Protocols
1. ReAct (Reasoning + Acting) Pattern
Status: De facto standard for agent reasoning
Paper: https://arxiv.org/abs/2210.03629
Format:
Thought: I need to find the weather in Paris
Action: get_weather(location="Paris")
Observation: Temperature is 22°C, sunny
Thought: I have the information needed
Answer: The weather in Paris is 22°C and sunny.
Structured Format:
{
  "thought": "I need to find the weather in Paris",
  "action": {
    "tool": "get_weather",
    "parameters": {"location": "Paris"}
  },
  "observation": "Temperature is 22°C, sunny",
  "answer": "The weather in Paris is 22°C and sunny."
}
Adoption: - ✅ LangChain agents - ✅ LangGraph - ✅ AutoGen - ✅ Most agent frameworks
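The Thought/Action/Observation cycle above can be driven by a very small loop. A sketch with a scripted stand-in for the model (the `Action:` regex only handles the single string-argument form used in the example; a real agent framework parses far more robustly):

```python
import re

def react_loop(model, tools: dict, question: str, max_steps: int = 5) -> str:
    """Minimal ReAct loop: the model emits Thought/Action lines,
    we run the named tool, feed back an Observation, and stop
    when the model emits an Answer line."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)
        transcript += step + "\n"
        answer = re.search(r"Answer: (.*)", step)
        if answer:
            return answer.group(1)
        action = re.search(r'Action: (\w+)\(location="([^"]+)"\)', step)
        if action:
            obs = tools[action.group(1)](action.group(2))
            transcript += f"Observation: {obs}\n"
    return "No answer reached"

# Scripted "model" standing in for a real LLM call
script = iter([
    'Thought: I need the weather\nAction: get_weather(location="Paris")',
    "Thought: I have what I need\nAnswer: 22°C and sunny.",
])
tools = {"get_weather": lambda loc: f"Temperature in {loc} is 22°C, sunny"}
result = react_loop(lambda _: next(script), tools,
                    "What's the weather in Paris?")
print(result)
```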
2. Agent Protocol (by AI Engineer Foundation)
Status: Emerging standard
Website: https://agentprotocol.ai/
GitHub: https://github.com/AI-Engineer-Foundation/agent-protocol
Purpose: Standardize agent-to-agent and human-to-agent communication
API Endpoints:
POST /agent/tasks # Create task
GET /agent/tasks/{id} # Get task status
POST /agent/tasks/{id}/steps # Execute step
GET /agent/tasks/{id}/steps # List steps
Task Format:
{
  "input": "Analyze sales data and create report",
  "additional_input": {
    "data_source": "s3://bucket/data.csv"
  }
}
Response:
{
  "task_id": "task_123",
  "status": "running",
  "steps": [
    {
      "step_id": "step_1",
      "name": "Read data",
      "status": "completed"
    },
    {
      "step_id": "step_2",
      "name": "Analyze",
      "status": "running"
    }
  ]
}
Adoption: - 🔄 Early adoption - 🔄 Framework integration in progress - ✅ AutoGPT support
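Because the task response carries per-step status, a client can compute progress without any extra endpoint. A small illustrative helper (`task_progress` is not part of the protocol itself):

```python
def task_progress(task: dict) -> float:
    """Fraction of completed steps in an Agent Protocol task
    response (0.0 when no steps have been reported yet)."""
    steps = task.get("steps", [])
    if not steps:
        return 0.0
    done = sum(1 for s in steps if s["status"] == "completed")
    return done / len(steps)

# Mirrors the task response example above
task = {
    "task_id": "task_123",
    "status": "running",
    "steps": [
        {"step_id": "step_1", "name": "Read data", "status": "completed"},
        {"step_id": "step_2", "name": "Analyze", "status": "running"},
    ],
}
print(task_progress(task))  # 0.5
```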
3. Multi-Agent Communication Standards
Patterns:
Broadcast Pattern
{
  "from": "agent_coordinator",
  "to": ["agent_1", "agent_2", "agent_3"],
  "type": "broadcast",
  "message": "Start processing task X"
}
Request-Response Pattern
{
  "from": "agent_1",
  "to": "agent_2",
  "type": "request",
  "request_id": "req_123",
  "action": "analyze_data",
  "data": {...}
}
Publish-Subscribe Pattern
{
  "topic": "task_completed",
  "publisher": "agent_1",
  "data": {
    "task_id": "task_123",
    "result": {...}
  }
}
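To make the publish-subscribe pattern concrete, here is a minimal in-process sketch (`MessageBus` is illustrative; production multi-agent systems would use a real broker such as Redis, SQS, or NATS):

```python
from collections import defaultdict

class MessageBus:
    """Tiny in-process pub-sub bus: publishers emit on a topic,
    every subscribed handler receives the envelope."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, publisher: str, data: dict) -> None:
        envelope = {"topic": topic, "publisher": publisher, "data": data}
        for handler in self._subscribers[topic]:
            handler(envelope)

bus = MessageBus()
received = []
bus.subscribe("task_completed", received.append)
bus.publish("task_completed", "agent_1", {"task_id": "task_123"})
print(received[0]["publisher"])  # agent_1
```

The key property shared with the JSON pattern above is that the publisher never names its recipients, which is what decouples agents from one another.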
Prompt Format Standards
1. ChatML (Chat Markup Language)
Status: OpenAI-originated format, widely reused by open-source chat models
Format:
<|im_start|>system
You are a helpful assistant.
<|im_end|>
<|im_start|>user
What's the weather?
<|im_end|>
<|im_start|>assistant
JSON Representation (the trailing open <|im_start|>assistant turn has no message of its own; the API appends it when generating):
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather?"}
  ]
}
Adoption: - ✅ OpenAI models - ✅ Many open-source models - ✅ Standard in fine-tuning
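Rendering a message list into raw ChatML is mechanical, which is why the format survives in fine-tuning pipelines. A sketch (`to_chatml` is an illustrative helper):

```python
def to_chatml(messages: list[dict]) -> str:
    """Render an OpenAI-style message list into raw ChatML,
    leaving the assistant turn open for the model to complete."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}\n<|im_end|>"
        for m in messages
    ]
    parts.append("<|im_start|>assistant")  # open turn: model generates from here
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather?"},
])
print(prompt)
```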
2. Anthropic Message Format
Format:
{
  "system": "You are a helpful assistant.",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's the weather?"}
      ]
    }
  ]
}
Multi-modal:
{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image",
          "source": {
            "type": "base64",
            "media_type": "image/jpeg",
            "data": "..."
          }
        },
        {"type": "text", "text": "What's in this image?"}
      ]
    }
  ]
}
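Building the base64 image block from raw bytes is a common client-side step. A sketch (`image_block` is an illustrative helper, not an SDK function):

```python
import base64

def image_block(data: bytes, media_type: str = "image/jpeg") -> dict:
    """Build an Anthropic-style base64 image content block
    from raw image bytes."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("ascii"),
        },
    }

block = image_block(b"\xff\xd8\xff")  # placeholder bytes, not a real JPEG
print(block["source"]["media_type"])  # image/jpeg
```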
3. Llama 2 Chat Format
Format:
<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>
What's the weather? [/INST]
Adoption: - ✅ Meta Llama 2 chat models - ✅ Many Llama-2-derived models - ⚠️ Llama 3 moved to a different header-token template
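A single-turn render of this template looks like the following sketch (`to_llama2_prompt` is an illustrative helper; multi-turn conversations repeat the `[INST] ... [/INST]` blocks):

```python
def to_llama2_prompt(system: str, user: str) -> str:
    """Render a single-turn Llama 2 chat prompt with a system
    block wrapped in <<SYS>> markers inside the first [INST]."""
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

prompt = to_llama2_prompt("You are a helpful assistant.", "What's the weather?")
print(prompt)
```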
API Standards
1. OpenAI API Standard
Status: Industry standard
Base URL: https://api.openai.com/v1
Endpoints:
POST /chat/completions # Chat completion
POST /completions # Text completion
POST /embeddings # Generate embeddings
POST /images/generations # Image generation
Request Format:
{
  "model": "gpt-4",
  "messages": [...],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}
Adoption: - ✅ OpenAI - ✅ Azure OpenAI - ✅ Many providers and local runtimes expose OpenAI-compatible endpoints - ✅ LiteLLM (unified interface)
2. OpenAPI/Swagger for Tool Definitions
Status: Standard for API documentation
Specification: https://swagger.io/specification/
Example:
openapi: 3.0.0
info:
  title: Weather API
  version: 1.0.0
paths:
  /weather:
    get:
      summary: Get current weather
      parameters:
        - name: location
          in: query
          required: true
          schema:
            type: string
      responses:
        '200':
          description: Weather data
          content:
            application/json:
              schema:
                type: object
                properties:
                  temperature:
                    type: number
                  condition:
                    type: string
Usage in Agents: - ✅ AWS Bedrock Agents (action groups) - ✅ LangChain tools - ✅ API Gateway integration
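Frameworks that consume OpenAPI specs typically translate each operation into a tool definition. A sketch of that mapping for query/path parameters (`openapi_params_to_tool` is an illustrative helper; request bodies need separate handling):

```python
def openapi_params_to_tool(name: str, summary: str, parameters: list) -> dict:
    """Convert OpenAPI query/path parameters into an OpenAI-style
    function tool definition with a JSON Schema parameters object."""
    properties, required = {}, []
    for p in parameters:
        properties[p["name"]] = dict(p["schema"])
        if p.get("required"):
            required.append(p["name"])
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": summary,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }

# Mirrors the /weather GET operation above
tool = openapi_params_to_tool(
    "get_weather", "Get current weather",
    [{"name": "location", "in": "query", "required": True,
      "schema": {"type": "string"}}],
)
print(tool["function"]["parameters"]["required"])  # ['location']
```

This works because both OpenAPI and the tool-calling standards lean on the same underlying JSON Schema vocabulary.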
Evaluation Standards
1. HELM (Holistic Evaluation of Language Models)
Organization: Stanford CRFM
Website: https://crfm.stanford.edu/helm/
Metrics: - Accuracy - Calibration - Robustness - Fairness - Bias - Toxicity - Efficiency
Scenarios: - Question answering - Information retrieval - Summarization - Sentiment analysis - Toxicity detection
2. MMLU (Massive Multitask Language Understanding)
Paper: https://arxiv.org/abs/2009.03300
Coverage: - 57 subjects - STEM, humanities, social sciences - Elementary to professional level
Standard Benchmark: - Reported by most major LLM providers - Commonly cited in model cards
3. HumanEval (Code Generation)
Paper: https://arxiv.org/abs/2107.03374
Dataset: https://github.com/openai/human-eval
Format: - 164 programming problems - Function signature + docstring - Unit tests for verification
Adoption: - ✅ Standard for code models - ✅ Used by OpenAI, Anthropic, Google
4. TruthfulQA
Paper: https://arxiv.org/abs/2109.07958
Purpose: Measure truthfulness and reduce hallucinations
Categories: - Health - Law - Finance - Politics
Interoperability Standards
1. ONNX (Open Neural Network Exchange)
Organization: Linux Foundation
Website: https://onnx.ai/
Purpose: Model format interoperability
Support: - PyTorch → ONNX - TensorFlow → ONNX - ONNX → Various runtimes
2. Hugging Face Model Hub Standard
Website: https://huggingface.co/
Standard Components: - Model card (README.md) - Config.json - Tokenizer files - Model weights
Model Card Format:
---
language: en
license: apache-2.0
tags:
- text-generation
- llm
datasets:
- common_crawl
metrics:
- perplexity
---
# Model Description
...
3. LangChain Standard Components
Abstractions:
# Standard interfaces (simplified; real LangChain signatures accept richer input types)
from abc import ABC, abstractmethod
from typing import Iterator, List

class Document: ...  # minimal stand-in for LangChain's Document (text + metadata)

class BaseLanguageModel(ABC):
    @abstractmethod
    def invoke(self, input: str) -> str: ...
    @abstractmethod
    def stream(self, input: str) -> Iterator[str]: ...

class BaseTool(ABC):
    name: str
    description: str
    @abstractmethod
    def run(self, input: str) -> str: ...

class BaseRetriever(ABC):
    @abstractmethod
    def get_relevant_documents(self, query: str) -> List[Document]: ...
Adoption: - ✅ LangChain ecosystem - ✅ LangGraph - ✅ Many frameworks adopt similar patterns
Emerging Standards
1. OpenAI Assistants API
Status: Emerging
Documentation: https://platform.openai.com/docs/assistants/overview
Features: - Persistent threads - Built-in tools (code interpreter, retrieval) - File handling
Format (simplified; the actual API creates messages and runs against a thread):
{
  "assistant_id": "asst_abc123",
  "thread_id": "thread_abc123",
  "message": "Analyze this data"
}
2. Semantic Kernel Standard Plugins
Organization: Microsoft
Website: https://learn.microsoft.com/en-us/semantic-kernel/
Plugin Format:
[KernelFunction]
[Description("Get weather for a location")]
public async Task<string> GetWeather(
    [Description("City name")] string location)
{
    // Implementation
}
3. LangGraph State Schema
Format:
from typing import TypedDict

class AgentState(TypedDict):
    messages: list
    next_action: str
    data: dict
Standard State Management: - Typed state definitions - State persistence - State versioning
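The value of a typed state is that every node reads and writes the same shape. The following sketch shows the idea with a plain function rather than the actual LangGraph API (`apply_step` is illustrative):

```python
from typing import TypedDict

class AgentState(TypedDict):
    messages: list
    next_action: str
    data: dict

def apply_step(state: AgentState, new_message: str, next_action: str) -> AgentState:
    """Return a new state rather than mutating in place, which is
    what makes checkpointing and persistence straightforward."""
    return {
        "messages": state["messages"] + [new_message],
        "next_action": next_action,
        "data": dict(state["data"]),
    }

state: AgentState = {"messages": [], "next_action": "start", "data": {}}
state = apply_step(state, "user: hi", "respond")
print(state["next_action"])  # respond
```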
Best Practices
1. Choose Standards Based on Ecosystem
AWS Ecosystem: - ✅ Use Anthropic format for Claude - ✅ Use OpenAPI for Bedrock Agents - ✅ Consider MCP for tool integration
OpenAI Ecosystem: - ✅ Use OpenAI function calling - ✅ Use ChatML format - ✅ Follow OpenAI API conventions
Multi-Provider: - ✅ Use LiteLLM for unified interface - ✅ Abstract tool definitions - ✅ Use MCP for portability
2. Version Your Schemas
{
  "schema_version": "1.0",
  "tool": {
    "name": "get_weather",
    "version": "2.0",
    "parameters": {...}
  }
}
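A consumer can then gate on the schema version before touching the payload. A semver-style sketch (`is_compatible` is an illustrative helper; your compatibility policy may be stricter):

```python
def is_compatible(schema_version: str, supported_major: int) -> bool:
    """Accept a payload only when its schema's major version
    matches what this consumer supports (semver-style check:
    minor bumps are assumed backward compatible, major bumps are not)."""
    major = int(schema_version.split(".")[0])
    return major == supported_major

payload = {"schema_version": "1.0",
           "tool": {"name": "get_weather", "version": "2.0"}}
print(is_compatible(payload["schema_version"], supported_major=1))  # True
```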
3. Document Deviations
When you deviate from standards, document why:
# Note: Using custom format instead of OpenAI standard
# Reason: Need additional metadata not supported by standard
# Migration path: Will adopt standard when feature is added
4. Use Adapters for Compatibility
class ToolAdapter:
    """Adapt between OpenAI and Anthropic tool formats."""

    @staticmethod
    def openai_to_anthropic(openai_tool):
        return {
            "name": openai_tool["function"]["name"],
            "description": openai_tool["function"]["description"],
            "input_schema": openai_tool["function"]["parameters"],
        }
5. Test Against Multiple Standards
def test_tool_compatibility():
    """Round-trip one tool definition through every target format.
    The validate_*_format functions are project-specific schema checks."""
    tool = MyTool()
    assert validate_openai_format(tool.to_openai())        # OpenAI function schema
    assert validate_anthropic_format(tool.to_anthropic())  # Anthropic input_schema
    assert validate_mcp_format(tool.to_mcp())              # MCP inputSchema
Standard Adoption Timeline
2020: OpenAI API becomes de facto standard
2021: Hugging Face model hub standardization
2022: ReAct pattern published
2023: OpenAI function calling standard
2023: Anthropic tool use format
2024: Model Context Protocol (MCP) announced
2024: Agent Protocol specification
2025: Convergence toward unified standards (ongoing)
Future Directions
Likely Developments:
Unified Tool Calling Standard
- Convergence of OpenAI and Anthropic formats
- MCP adoption grows
Agent Communication Protocol
- Standardized multi-agent communication
- Cross-framework agent collaboration
Evaluation Standards
- More comprehensive benchmarks
- Domain-specific evaluation suites
Safety Standards
- Standardized guardrails
- Content filtering protocols
- Bias measurement standards
Observability Standards
- Standardized tracing formats
- Common metrics definitions
- Debugging protocols
Resources
Standards Organizations
- OpenAI: https://platform.openai.com/docs/
- Anthropic: https://docs.anthropic.com/
- MCP: https://modelcontextprotocol.io/
- OpenAPI Initiative: https://www.openapis.org/
- Linux Foundation AI: https://lfaidata.foundation/
Specifications
- OpenAI Function Calling: https://platform.openai.com/docs/guides/function-calling
- Anthropic Tool Use: https://docs.anthropic.com/claude/docs/tool-use
- MCP Specification: https://spec.modelcontextprotocol.io/
- OpenAPI 3.0: https://swagger.io/specification/
- JSON-RPC 2.0: https://www.jsonrpc.org/specification
Benchmarks
- HELM: https://crfm.stanford.edu/helm/
- Open LLM Leaderboard: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
- Chatbot Arena: https://chat.lmsys.org/
Last Updated: January 2026
Note: Standards in the AI/LLM space are rapidly evolving. This document reflects the current state but will require regular updates as the ecosystem matures.